---
title: Challengers tab
description: How to use the Challengers tab to submit challenger models that shadow a deployed model and replay predictions made against the deployed model. If a challenger outperforms the deployed model, you can replace the model.

---

# Challengers tab {: #challengers-tab }

!!! info "Availability information"
    The **Challengers** tab is a feature exclusive to DataRobot MLOps users. Contact your DataRobot representative for information on enabling it.

During model development, many models are often compared to one another until one is chosen to be deployed into a production environment. The **Challengers** tab provides a way to continue model comparison post-deployment. You can submit challenger models that shadow a deployed model and replay predictions made against the deployed model. This allows you to compare the predictions made by the challenger models to the currently deployed model (the "champion") to determine if there is a superior DataRobot model that would be a better fit.

##  Enable challenger models {: #enable-challenger-models }

To enable challenger models for a deployment, you must enable the **Challengers** tab and [prediction row storage](challengers-settings). To do so, configure the deployment's data drift settings either when [creating a deployment](add-deploy-info#challenger-analysis) or on the [**Challengers > Settings**](challengers-settings) tab. If you enable challenger models, prediction row storage is automatically enabled for the deployment and can't be turned off, because challengers require it.

!!! info "Availability information"
    To enable challengers and replay predictions against them, the deployed model must support target drift tracking *and* not be a [Feature Discovery](fd-overview) or [Unstructured custom inference](unstructured-custom-models) model.

![](images/challenge-1.png)

##  Select a challenger model {: #select-a-challenger-model }

Before adding a challenger model to a deployment, you must first build and select the model to be added as a challenger. Complete the [modeling process](model-data) and choose a model from the Leaderboard, or deploy a [custom model](reg-create#add-a-custom-inference-model) as a model package. When selecting a challenger model, consider the following:

* It *must* have the same target type as the champion model.
* It *cannot* be the same Leaderboard model as an existing champion or challenger; each challenger *must* be a unique model. If you create multiple model packages from the same Leaderboard model, you can't use those models as challengers in the same deployment.
* It *cannot* be a Feature Discovery model.
* It *does not* need to be trained on the same feature list as the champion model; however, it *must* share some features, and, to successfully [replay predictions](#replay-predictions), you *must* send the union of all features required for champion and challengers.
* It *does not* need to be built from the same project as the champion model.
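
The selection rules above can be expressed as a pre-flight check. This is a minimal sketch, not a DataRobot API: `ModelInfo`, `check_challenger`, and `replay_feature_set` are hypothetical names, and the checks only mirror the rules listed here plus the limit of five challengers per deployment.

```python
from dataclasses import dataclass

MAX_CHALLENGERS = 5  # per-deployment limit described in "Add challengers to a deployment"


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical stand-in for the metadata of a registered model package."""
    leaderboard_model_id: str
    target_type: str
    features: frozenset
    is_feature_discovery: bool = False


def check_challenger(champion, challengers, candidate):
    """Return the reasons `candidate` can't be added (an empty list means eligible)."""
    problems = []
    if candidate.target_type != champion.target_type:
        problems.append("target type differs from champion")
    used_ids = {champion.leaderboard_model_id} | {c.leaderboard_model_id for c in challengers}
    if candidate.leaderboard_model_id in used_ids:
        problems.append("same Leaderboard model as an existing champion or challenger")
    if candidate.is_feature_discovery:
        problems.append("Feature Discovery models can't be challengers")
    if not (candidate.features & champion.features):
        problems.append("shares no features with champion")
    if len(challengers) >= MAX_CHALLENGERS:
        problems.append("deployment already has the maximum number of challengers")
    return problems


def replay_feature_set(champion, challengers):
    """Union of features that prediction requests must include for replay to succeed."""
    required = set(champion.features)
    for challenger in challengers:
        required |= challenger.features
    return required
```

The union returned by `replay_feature_set` is the set of columns your prediction requests should carry so that every challenger can score the replayed rows.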

After you select a model to serve as a challenger, navigate from the Leaderboard to **Predict** > **Deploy** and select **Add model package to registry**. This creates a [model package](reg-create) for the selected model in the [**Model Registry**](registry/index), so you can add the model to a deployment as a challenger.

![](images/challenge-2.png)

##  Add challengers to a deployment {: #add-challengers-to-a-deployment }

To add a challenger model to a deployment, navigate to the **Challengers** tab and select **Add challenger model > Select existing model**. You can add up to 5 challengers to each deployment.

![](images/add-challenger.png)

!!! note
    The selection list contains only model packages whose target type and target name match those of the champion model.

The modal prompts you to select a model package from the registry to serve as a challenger model. Choose the model to add and click **Select model package**.

DataRobot verifies that the model shares features and a target type with the champion model. Once verified, click **Add Challenger**. The model is now added to the deployment as a challenger.

![](images/challenge-4.png)

##  Replay predictions {: #replay-predictions }

After adding a challenger model, you can replay the stored predictions made with the champion model against all challengers, allowing you to compare performance metrics such as predicted values, accuracy, and data errors across models.

To replay predictions, select **Update challenger predictions**.

![](images/challenge-5.png)

??? "Organization considerations"
    If you aren't in the [Organization](admin-overview#what-are-organizations) associated with the deployment, you don't have the required permissions to replay predictions against challenger models. This restriction also applies to deployment [Owners](roles-permissions#deployment-roles).

The champion model computes and stores up to 100,000 prediction rows per hour. For each hour within the time range specified by the [date slider](data-drift#use-the-time-range-and-resolution-dropdowns), the challengers replay the first 10,000 rows of the prediction requests made during that hour. This replay limit doesn't apply to time series deployments, where the challengers use all prediction data to compare statistics.
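
The hourly sampling rule can be sketched as follows. `rows_to_replay` is a hypothetical helper, not part of DataRobot; the sketch assumes rows are supplied in arrival order and that, for time series deployments, only the 10,000-row replay limit is lifted while the 100,000-row storage cap still applies.

```python
from collections import defaultdict
from datetime import datetime

STORED_ROWS_PER_HOUR = 100_000   # champion storage cap described above
REPLAYED_ROWS_PER_HOUR = 10_000  # rows replayed per hour for challengers


def rows_to_replay(rows, start: datetime, end: datetime, time_series: bool = False) -> list:
    """Select the stored prediction rows that challengers would replay.

    `rows` is an iterable of (timestamp, payload) tuples in arrival order;
    only rows with start <= timestamp < end are considered.
    """
    per_hour = defaultdict(list)
    for ts, payload in rows:
        if not (start <= ts < end):
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = per_hour[hour]
        if len(bucket) < STORED_ROWS_PER_HOUR:  # assumed: rows past the cap aren't stored
            bucket.append(payload)
    limit = None if time_series else REPLAYED_ROWS_PER_HOUR  # list[:None] keeps every row
    selected = []
    for hour in sorted(per_hour):
        selected.extend(per_hour[hour][:limit])
    return selected
```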

After the challengers replay predictions, click **Refresh** on the date slider to view updated [performance metrics](#challenger-performance-metrics) for the challenger models.

![](images/challenge-6.png)

## Schedule prediction replay {: #schedule-prediction-replay }

You can replay predictions with challengers on a periodic schedule instead of doing so manually. Navigate to a deployment's [**Challengers > Settings**](challengers-settings) tab, enable the **Automatically replay challengers** toggle, and configure the preferred cadence and time of day for replaying predictions:

![](images/challenge-13.png)

!!! note
    Only the deployment [_Owner_](roles-permissions#deployment-roles) can schedule challenger replay.

Once enabled, the replay triggers at the configured time for all challengers. If a deployment has prediction requests from the past and you add challengers now, the scheduled job scores the newly added challenger models on its next run.

## View challenger job history {: #view-challenger-job-history }

After adding one or more challenger models and replaying predictions, you can view a deployment's challenger prediction jobs on the [Deployments > Prediction Jobs](batch-pred-jobs#manage-prediction-jobs) page.

To view challenger prediction jobs, click **Job History**.

![](images/challenger-job-history.png)

The **Prediction Jobs** page opens, filtered to display challenger jobs for the deployment from which you opened **Job History**.

##  Challenger models overview {: #challenger-models-overview }

The **Challengers** tab displays information about the champion model and each challenger.

![](images/challenge-11.png)

| | Element | Description |
|---|---|---|
|![](images/icon-1.png)| Display Name | The display name for each model. Use the pencil icon to edit the display name. This field is useful for describing the purpose or strategy of each challenger (e.g., "reference model," "former champion," "reduced feature list").|
|![](images/icon-2.png)| Challenger models | The list of challenger models. Each model is associated with a color. These colors allow you to compare the models using visualization tools.|
|![](images/icon-3.png)| Model data | The metadata for each model, including the project name, model name, and the execution environment type.|
|![](images/icon-4.png)| Prediction Environment | The external environment the model uses to manage deployment predictions on a system outside of DataRobot. For more information, see [Prediction environments](pred-env).|
|![](images/icon-5.png)| Accuracy | The model's accuracy metric calculation for the selected date range and, for challengers, a comparison with the champion's accuracy metric calculation. Use the **Accuracy metric** dropdown menu to compare different metrics. For more information on model accuracy, see the [Accuracy chart](challengers#accuracy-chart).|
|![](images/icon-6.png)| Training Data | The filename of the data used to train the model.|
|![](images/icon-7.png)| Actions | The actions available for each model:<ul><li>**Replace**: Promotes a challenger to the champion (the currently deployed model) and demotes the current champion to a challenger model. </li><li>**Remove**: Removes the model from the deployment as a challenger. Only challengers can be removed; a champion must be demoted to a challenger before it can be removed.</li></ul>|

###  Challenger performance metrics {: #challenger-performance-metrics }

After prediction data is replayed for challenger models, you can examine the charts below, which capture the performance metrics recorded for each model.

Each model is listed with its corresponding color. Uncheck a model's box to stop displaying the model's performance data on the charts.

![](images/challenge-7.png)

####  Predictions chart {: #predictions-chart }

The Predictions chart records the average predicted value of the target for each model over time. Hover over a point to compare the average value for each model at a specific point in time.

![](images/challenge-8.png)

For binary classification projects, use the **Class** dropdown to select the class for which you want to analyze the average predicted values. The chart also includes a toggle to switch between continuous and binary modes. Continuous mode shows the positive class predictions as probabilities between 0 and 1, without taking the prediction threshold into account. Binary mode applies the prediction threshold and shows the percentage of all predictions assigned to each class.
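
The two toggle modes summarize the same positive-class probabilities differently. A minimal sketch with hypothetical helper and class names (`continuous_mode`, `binary_mode`, `"yes"`/`"no"`); treating a probability exactly equal to the threshold as positive is an assumption.

```python
def continuous_mode(probabilities):
    """Average positive-class probability, ignoring the prediction threshold."""
    return sum(probabilities) / len(probabilities)


def binary_mode(probabilities, threshold=0.5, positive="yes", negative="no"):
    """Percentage of predictions assigned to each class after applying the threshold."""
    n = len(probabilities)
    n_pos = sum(1 for p in probabilities if p >= threshold)  # assumed tie-breaking: >=
    return {positive: 100 * n_pos / n, negative: 100 * (n - n_pos) / n}
```

For probabilities `[0.2, 0.6, 0.9, 0.4]`, continuous mode reports an average of about 0.525, while binary mode with a 0.5 threshold reports 50% for each class.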

####  Accuracy chart {: #accuracy-chart }

The Accuracy chart records the change in a selected [accuracy](deploy-accuracy) metric value (LogLoss in this example) over time. These metrics are identical to those used for the evaluation of the model before deployment. Use the dropdown to change the accuracy metric. You can select from [any of the supported metrics](deploy-accuracy#available-accuracy-metrics) for the deployment's modeling type.

!!! important
    You must [set an association ID](accuracy-settings#select-an-association-id) _before_ making predictions to include those predictions in accuracy tracking.

![](images/challenge-9.png)
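
The association ID matters because accuracy can only be scored once each prediction is joined to an actual that arrives later, and the association ID is the join key. The sketch below is a minimal illustration, assuming binary classification and LogLoss; `accuracy_by_model` and `log_loss` are hypothetical helpers, not DataRobot functions, and DataRobot's own computation may differ in details such as clipping.

```python
import math


def log_loss(pairs, eps=1e-15):
    """Binary LogLoss over (probability, actual) pairs, where actual is 0 or 1."""
    total = 0.0
    for p, y in pairs:
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(pairs)


def accuracy_by_model(predictions, actuals):
    """Score each model only on rows whose association ID has an actual.

    predictions: {model_name: {association_id: probability}}
    actuals:     {association_id: 0 or 1}
    """
    scores = {}
    for model, preds in predictions.items():
        pairs = [(p, actuals[aid]) for aid, p in preds.items() if aid in actuals]
        scores[model] = log_loss(pairs) if pairs else None  # no matched rows, no score
    return scores
```

Rows without a matching actual (here, any association ID missing from `actuals`) simply drop out of the accuracy calculation, which mirrors why predictions made before an association ID is set can't be tracked.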

####  Data Errors chart {: #data-errors-chart }

The Data Errors chart records the [data error rate](service-health) for each model over time. Data error rate measures the percentage of requests that result in a 4xx error (problems with the prediction request submission).

![](images/challenge-10.png)
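
As a worked example of the definition above, the data error rate counts only 4xx responses; 5xx responses fall outside this definition. `data_error_rate` is a hypothetical helper operating on a list of HTTP status codes:

```python
def data_error_rate(status_codes):
    """Percentage of prediction requests that returned a 4xx status code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 400 <= code < 500)
    return 100 * errors / len(status_codes)
```

For the codes `[200, 200, 400, 422, 500]`, two of five requests are 4xx errors, so the rate is 40%; the 500 response is not counted.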

## Challenger model comparisons {: #challenger-model-comparisons }

MLOps allows you to compare challenger models against each other and against the currently deployed model (the "champion") to ensure that your deployment uses the best model for your needs. After evaluating DataRobot's model comparison visualizations, you can replace the champion model with a better-performing challenger.

DataRobot renders these visualizations from a comparison dataset that you select. Scoring every model on the same dataset and partition ensures a like-for-like comparison, while still allowing the champion and challenger models to be trained on different datasets. For example, you may train a challenger model on an updated snapshot of the same data source used by the champion.

!!! warning
    Make sure your comparison dataset is out-of-sample for the models being compared (i.e., it doesn't include the training data from any models included in the comparison).
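
One way to act on this warning before uploading a comparison dataset is to fingerprint rows and look for overlap with each model's training data. This sketch assumes you have local copies of the training data and that the chosen `key_columns` identify a row; `row_fingerprint` and `overlapping_rows` are hypothetical helpers, not part of DataRobot.

```python
import hashlib
import json


def row_fingerprint(row, key_columns):
    """Stable SHA-256 hash of the selected columns of a row (a dict)."""
    values = [row.get(col) for col in key_columns]
    return hashlib.sha256(json.dumps(values, default=str).encode("utf-8")).hexdigest()


def overlapping_rows(comparison_rows, training_sets, key_columns):
    """Count comparison rows that also appear in any compared model's training data."""
    seen = {row_fingerprint(r, key_columns) for rows in training_sets for r in rows}
    return sum(1 for r in comparison_rows if row_fingerprint(r, key_columns) in seen)
```

A nonzero count suggests the comparison dataset is not fully out-of-sample for at least one of the models.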

### Generate model comparisons {: #generate-model-comparisons }

After you [enable challengers](challengers#enable-challenger-models) and [add one or more challengers](challengers#add-challengers-to-a-deployment) to a deployment, you can generate comparison data and visualizations.

1. On the **Deployments** page, locate and expand the deployment with the champion and challenger models you want to compare.

2. Click the **Challengers** tab.

3. On the **Challengers Summary** tab, if necessary, [add a challenger model](challengers#add-challengers-to-a-deployment) and [replay the predictions](challengers#replay-predictions) for challengers.

4. Click the **Model Comparison** tab.

    The following table describes the elements of the **Model Comparison** tab:

    ![](images/challenger-model-comparison-tab.png)

    |   | Element | Description |
    |---|---------|-------------|
    |![](images/icon-1.png)| Model 1 | Defaults to the champion model&mdash;the currently deployed model. Click to select a different model to compare.|
    |![](images/icon-2.png)| Model 2 | Defaults to the first challenger model in the list. Click to select a different model to compare. If the list doesn't contain a model you want to compare to Model 1, click the **Challengers Summary** tab to add a new challenger.|
    |![](images/icon-3.png)| Open model package | Click to view the model's details. The details display in the **Model Packages** tab in the Model Registry.|
    |![](images/icon-4.png)| Promote to champion | If the challenger model in the comparison is the best model, click **Promote to champion** to replace the deployed model (the "champion") with this model.|
    |![](images/icon-5.png)| Add comparison dataset | Select a dataset for generating insights on both models. Be sure to select a dataset that is out-of-sample for both models (see [stacked predictions](data-partitioning#what-are-stacked-predictions)). Holdout and validation partitions for Model 1 and Model 2 are available as options if these partitions exist for the original model. By default, the holdout partition for Model 1 is selected. To specify a different dataset, click **+ Add comparison dataset** and choose a local file or a [snapshotted](glossary/index#snapshot) dataset from the AI Catalog.|
    |![](images/icon-6.png)| Prediction environment | Select a [prediction environment](pred-env) for scoring both models.|
    |![](images/icon-7.png)| Model Insights | Compare model predictions, metrics, and more.|

5. Scroll to the **Model Insights** section of the Challengers page and click **Compute insights**.

    You can generate new insights using a different dataset by clicking **+ Add comparison dataset** and then clicking **Compute insights** again.

### View model comparisons {: #view-model-comparisons }

Once you compute model insights, the **Model Insights** section displays the following tabs, depending on the project type:

!!! note
    Multiclass classification projects only support accuracy comparison.

<table>
  <tr>
    <th></th>
    <th scope="col">Accuracy</th>
    <th scope="col">Dual lift</th>
    <th scope="col">Lift</th>
    <th scope="col">ROC</th>
    <th scope="col">Predictions Difference</th>
  </tr>
  <tr>
    <th scope="row">Regression</th>
    <td>✔</td>
    <td>✔</td>
    <td>✔</td>
    <td></td>
    <td>✔</td>
  </tr>
  <tr>
    <th scope="row">Binary</th>
    <td>✔</td>
    <td>✔</td>
    <td>✔</td>
    <td>✔</td>
    <td>✔</td>
  </tr>
  <tr>
    <th scope="row">Multiclass</th>
    <td>✔</td>
    <td></td>
    <td></td>
    <td></td>
    <td></td>
  </tr>
  <tr>
    <th scope="row">Time series</th>
    <td>✔</td>
    <td>✔</td>
    <td>✔</td>
    <td></td>
    <td>✔</td>
  </tr>
</table>


=== "Accuracy"

    After DataRobot computes model insights for the deployment, you can compare model accuracy.

    Under **Model Insights**, click the **Accuracy** tab to compare accuracy metrics:

    ![](images/challenger-compare-accuracy.png)

    The two columns show the metrics for each model. Highlighted numbers represent favorable values. In this example, the champion, **Model 1**, outperforms **Model 2** for most metrics shown.

    For time series projects, you can evaluate accuracy metrics by applying the following filters:

    * **Forecast distance**: View accuracy for the selected [forecast distance](glossary/index#forecast-distance) row within the [forecast window](glossary/index#forecast-window) range.

    * **For all *x* series**: View accuracy scores by metric. This view reports scores in all available accuracy metrics for both models across all *x* [time series](glossary/index#time-series) in the comparison dataset.

    * **Per series**: View accuracy scores by series within a [multiseries](glossary/index#multiseries) comparison dataset. This view reports scores in a single accuracy metric (selected in the **Metric** dropdown menu) for each **Series ID** (e.g., store number) in the dataset for both models.

    For multiclass projects, you can evaluate accuracy metrics by applying the following filters:

    * **For all *x* classes**: View accuracy scores by metric. This view reports scores in all available accuracy metrics for both models across all *x* classes in the [multiclass](glossary/index#classification) comparison dataset.

    * **Per class**: View accuracy scores by class within a [multiclass classification](glossary/index#classification) problem. This view reports scores in a single accuracy metric (selected in the **Metric** dropdown menu) for each **Class** (e.g., buy, sell, or hold) in the dataset for both models.
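
    The difference between the pooled view and the per-series (or per-class) view is a grouping step. A minimal sketch, using RMSE as an example metric and a hypothetical `per_group_scores` helper; any metric function can be passed in, and DataRobot's own metric implementations may differ:

    ```python
    import math
    from collections import defaultdict


    def rmse(pairs):
        """Root mean squared error over (prediction, actual) pairs."""
        return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


    def per_group_scores(rows, metric=rmse):
        """rows: iterable of (group, prediction, actual); returns {group: score}.

        `group` is a series ID (e.g., store number) or a class label.
        """
        groups = defaultdict(list)
        for group, pred, actual in rows:
            groups[group].append((pred, actual))
        return {g: metric(pairs) for g, pairs in groups.items()}
    ```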


=== "Dual lift"

    A [dual lift chart](model-compare#dual-lift-chart) is a visualization comparing two selected models against each other. It can reveal where each model underpredicts or overpredicts the actual values across the distribution of their predictions. The prediction data is sorted in increasing order and divided into bins of equal size.

    To view the dual lift chart for the two models being compared, under **Model Insights**, click the **Dual lift** tab:

    ![](images/challenger-compare-dual-lift.png)

    The curves for the two models represented on this chart maintain the color they were assigned when added to the deployment (as either a champion or challenger). You can hide or show the model curves and the actual curve:

    * The **+** icons in the plot area of the chart represent the models' predicted values. Click the **+** icon next to a model name in the header to hide or show the curve for a particular model.
    * The orange <span style="color: #ff6d00">**o**</span> icons in the plot area of the chart represent the actual values. Click the orange <span style="color: #ff6d00">**o**</span> icon next to **Actual** to hide or show the curve representing the actual values.
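
    The binning behind a dual lift chart can be sketched as follows. This is an illustration under stated assumptions, not DataRobot's implementation: `dual_lift_bins` is hypothetical, and sorting rows by the difference between the two models' predictions (Model 1 minus Model 2) before splitting them into equal-count bins is an assumption based on the Leaderboard dual lift chart.

    ```python
    def dual_lift_bins(pred_1, pred_2, actuals, n_bins=10):
        """Return one dict per bin with each model's mean prediction and the mean actual."""
        # Assumed sort key: the difference between the two models' predictions.
        rows = sorted(zip(pred_1, pred_2, actuals), key=lambda r: r[0] - r[1])
        n = len(rows)
        bins = []
        for i in range(n_bins):
            chunk = rows[i * n // n_bins:(i + 1) * n // n_bins]  # near-equal row counts
            if not chunk:
                continue
            k = len(chunk)
            bins.append({
                "model_1": sum(r[0] for r in chunk) / k,
                "model_2": sum(r[1] for r in chunk) / k,
                "actual": sum(r[2] for r in chunk) / k,
            })
        return bins
    ```

    Bins at either end hold the rows where the models disagree most, so comparing each curve to the actual values there shows which model is closer when they diverge.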

=== "Lift"

    A [lift chart](lift-chart) depicts how well a model segments the target population and how capable it is of predicting the target, allowing you to visualize the model's effectiveness.

    To view the lift chart for the models being compared, under **Model Insights**, click the **Lift** tab:

    ![](images/challenger-compare-lift.png)

    The curves for the two models represented on this chart maintain the color they were assigned when added to the deployment (as either a champion or challenger).
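
    The binning behind a lift chart can be sketched as: sort rows by predicted value, split them into bins with near-equal row counts, and compare the mean prediction to the mean actual in each bin. `lift_bins` is a hypothetical helper, and the default of 10 bins is an assumption.

    ```python
    def lift_bins(predictions, actuals, n_bins=10):
        """Return (mean prediction, mean actual) for each equal-count bin, lowest predictions first."""
        rows = sorted(zip(predictions, actuals))  # sort by predicted value
        n = len(rows)
        result = []
        for i in range(n_bins):
            chunk = rows[i * n // n_bins:(i + 1) * n // n_bins]
            if chunk:
                k = len(chunk)
                result.append((sum(p for p, _ in chunk) / k, sum(a for _, a in chunk) / k))
        return result
    ```

    Running the same function on each model's predictions over the comparison dataset gives the two curves shown in the tab.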

=== "ROC"

    !!! note
        The ROC tab is only available for binary classification projects.

    An [ROC curve](roc-curve) plots the true-positive rate against the false-positive rate for a given data source. Use the ROC curve to explore classification, performance, and statistics for the models you're comparing.

    To view the ROC curves for the models being compared, under **Model Insights**, click the **ROC** tab:

    ![](images/challenger-compare-roc.png)

    The curves for the two models represented on this chart maintain the color they were assigned when added to the deployment (as either a champion or challenger). You can update the prediction thresholds for the models by clicking the pencil icons.
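
    Each point on an ROC curve corresponds to one prediction threshold, which is why editing a threshold with the pencil icon moves a model's operating point along its curve. A minimal sketch with hypothetical helpers `roc_point` and `roc_curve`; treating a score at or above the threshold as a positive prediction is an assumption.

    ```python
    def roc_point(scores, labels, threshold):
        """(false-positive rate, true-positive rate) at one threshold; labels are 0 or 1."""
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        pos = sum(labels)
        neg = len(labels) - pos
        return fp / neg, tp / pos


    def roc_curve(scores, labels):
        """ROC points using every distinct score as a threshold, starting at (0, 0)."""
        points = [(0.0, 0.0)]
        for t in sorted(set(scores), reverse=True):
            points.append(roc_point(scores, labels, t))
        return points
    ```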

=== "Predictions Difference"

    Click the **Predictions Difference** tab to compare the predictions of two models on a row-by-row basis. The histogram shows the percentage of predictions that fall within the match threshold you specify in the **Prediction match threshold** field (along with the corresponding numbers of rows).

    The header of the histogram displays the percentage of predictions:

    * Between the positive and negative values of the match threshold (shown in green)
    * Greater than the upper (positive) match threshold (shown in red)
    * Less than the lower (negative) match threshold (shown in red)

    ![](images/challenger-compare-predictions-diff-1.png)
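
    The three percentages in the histogram header can be reproduced with a short calculation. In this sketch, `match_summary` is a hypothetical helper; computing the difference as Model 1 minus Model 2 and counting values exactly on the threshold as matches are assumptions.

    ```python
    def match_summary(pred_1, pred_2, threshold=0.0025):
        """Return {bucket: (row count, percentage)} for matches and mismatches."""
        diffs = [a - b for a, b in zip(pred_1, pred_2)]  # assumed: Model 1 minus Model 2
        n = len(diffs)
        within = sum(1 for d in diffs if -threshold <= d <= threshold)
        above = sum(1 for d in diffs if d > threshold)
        below = n - within - above
        return {
            "within": (within, 100 * within / n),  # green: matches
            "above": (above, 100 * above / n),     # red: above the upper threshold
            "below": (below, 100 * below / n),     # red: below the lower threshold
        }
    ```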

    ??? note "How are bin sizes calculated?"
        The size of the **Predictions Difference** bins in the histogram depends on the **Prediction match threshold** you set. The width of the central bin equals the difference between the upper (positive) and lower (negative) match thresholds. The default prediction match threshold is 0.0025, so the central bin is 0.005 wide (0.0025 + |-0.0025|). Moving outward from the central bin, each bin boundary is ten times the previous one (0.0025, 0.025, 0.25, and so on), so each bin after the first one on either side is ten times wider than the bin inside it. The last bin on either end expands to fit the full Prediction Difference range. For example, with the default **Prediction match threshold**, the bin sizes are as follows (where x is the amount by which the largest absolute Prediction Difference exceeds 250):

        <table>
        <tr>
            <th></th>
            <th scope="col">Bin -5</th>
            <th scope="col">Bin -4</th>
            <th scope="col">Bin -3</th>
            <th scope="col">Bin -2</th>
            <th scope="col">Bin -1</th>
            <th scope="col">Bin 0</th>
            <th scope="col">Bin 1</th>
            <th scope="col">Bin 2</th>
            <th scope="col">Bin 3</th>
            <th scope="col">Bin 4</th>
            <th scope="col">Bin 5</th>
        </tr>
        <tr>
            <th scope="row">Range</th>
            <td>−(250 + x) to −25</td>
            <td>−25 to −2.5</td>
            <td>−2.5 to −0.25</td>
            <td>−0.25 to −0.025</td>
            <td>−0.025 to −0.0025</td>
            <td>−0.0025 to +0.0025</td>
            <td>+0.0025 to +0.025</td>
            <td>+0.025 to +0.25</td>
            <td>+0.25 to +2.5</td>
            <td>+2.5 to +25</td>
            <td>+25 to (+250 + x)</td>
        </tr>
        <tr>
            <th scope="row">Size</th>
            <td>225 + x</td>
            <td>22.5</td>
            <td>2.25</td>
            <td>0.225</td>
            <td>0.0225</td>
            <td>0.005</td>
            <td>0.0225</td>
            <td>0.225</td>
            <td>2.25</td>
            <td>22.5</td>
            <td>225 + x</td>
        </tr>
        </table>
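
        The bin layout in the table can be generated from the threshold alone. This hypothetical `prediction_difference_bins` helper sketches the rule described above, assuming the same largest absolute difference on both sides of zero:

        ```python
        def prediction_difference_bins(max_abs_diff, threshold=0.0025, n_side_bins=5):
            """Return (low, high) ranges for bins -n..n of the histogram."""
            edges = [threshold * 10 ** i for i in range(n_side_bins)]  # t, 10t, 100t, ...
            positive = [(edges[i], edges[i + 1]) for i in range(n_side_bins - 1)]
            # The outermost bin stretches to the largest absolute difference.
            positive.append((edges[-1], max(max_abs_diff, edges[-1])))
            negative = [(-hi, -lo) for lo, hi in reversed(positive)]
            return negative + [(-threshold, threshold)] + positive
        ```

        With a largest absolute difference of 300 (so x = 50), the outermost bins run from 25 to 300 on each side, a width of 275, which matches 225 + x.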

    If many matches dilute the histogram, you can toggle **Scale y-axis to ignore perfect matches** to focus on the mismatches.

    The bottom section of the **Predictions Difference** tab shows the 1,000 rows with the largest absolute prediction differences.

    ![](images/challenger-compare-predictions-diff-2.png)

    The **Difference** column shows how far apart the predictions are.


### Replace champion with challenger {: #replace-champion-with-challenger }

After comparing models, if you find a model that outperforms the deployed model, you can set it as the new champion.

1. Evaluate the comparison model insights to determine the best-performing model.

2. If a challenger model outperforms the deployed model, click **Promote to champion**.

3. Select a **Replacement Reason** and click **Accept and Replace**.

    ![](images/challenger-replace-model.png)

    The challenger model is now the champion (deployed) model.

## Challengers for external deployments {: #challengers-for-external-deployments }

External deployments with [remote prediction environments](pred-env) can also use the **Challengers** tab. Remote models can serve as the champion model, and you can compare them to DataRobot and custom models serving as challengers.

The [workflow](challengers#enable-challenger-models) for adding challenger models is largely the same; however, external deployments differ in the ways outlined below.

###  Add challenger models to external deployments {: #add-challenger-models-to-external-deployments }

To enable challenger support, access an external deployment (one created with an external model package). In the **Settings** tab, under the **Data Drift** header, enable challenger models and [prediction row storage](challengers-settings).

![](images/challenge-1.png)

The **Challengers** tab is now accessible. To add challenger models to the deployment, navigate to the tab and click **Add challenger model > Select existing model**.

![](images/add-challenger.png)

Select a model package for the challenger you want to add (custom and DataRobot models only). You must also select the prediction environment the model package uses, which determines where the model runs predictions. DataRobot and custom models serving as challengers can only use a DataRobot prediction environment (unlike the champion model, which is deployed to an external prediction environment). When you have chosen the prediction environment, click **Select**.

![](images/ext-champ-3.png)

The tab updates to display the model package you wish to add, verifying that the features used in the model package match the deployed model. Select **Add challenger**.

![](images/ext-champ-4.png)

The model package is now serving as a challenger model for the remote deployment.

### Add external challenger comparison dataset {: #add-external-challenger-comparison-dataset }

To compare an external model, you must provide a comparison dataset that includes the actuals *and* the external model's prediction results. When you upload the comparison dataset, you can specify the column containing the prediction results.

To add a comparison dataset for an external model, follow the [Generate model comparisons](#generate-model-comparisons) process, and on the **Model Comparison** tab, upload your comparison dataset with a **Prediction column** identifier. Make sure the comparison dataset includes the prediction results generated by the external model in the column identified by **Prediction column**.

![](images/ext-champ-6.png)
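
Before uploading, it can help to check the comparison dataset locally. A minimal sketch using Python's standard `csv` module; the column names and the `check_comparison_dataset` helper are hypothetical, and the check assumes the external model outputs numeric predictions (regression values or probabilities).

```python
import csv
import io


def check_comparison_dataset(csv_text, target_column, prediction_column):
    """Return the line numbers of rows missing an actual or a numeric prediction.

    Raises ValueError if either column is absent from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {target_column, prediction_column} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    bad_rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            float(row[prediction_column])
        except (TypeError, ValueError):  # empty, non-numeric, or absent value
            bad_rows.append(line_no)
            continue
        if not row[target_column]:
            bad_rows.append(line_no)
    return bad_rows
```

For example, `check_comparison_dataset(text, "churn", "external_pred")` returns an empty list when every row has both an actual and a numeric prediction.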

### Manage challengers for external deployments {: #manage-challengers-for-external-deployments }

You can manage challenger models for remote deployments with various actions:

* To edit the prediction environment used by a challenger, select the pencil icon and choose a new prediction environment from the dropdown.

* To replace the deployed model with a challenger, the challenger must have a compatible prediction environment. If the former champion is a remote model, it *does not* become a challenger, because remote models can't serve as challengers.

#### Challenger promotion to champion {: #challenger-promotion-to-champion}

A deployment's champion can't switch between an external prediction environment and a DataRobot prediction environment. When a challenger replaces a champion running in an external prediction environment, that challenger inherits the external environment of the former champion. If the Management Agent isn't configured in the external prediction environment, you must manually deploy the new champion in the external environment to continue making predictions.

#### Champion demotion to challenger {: #champion-demotion-to-challenger}

If the former champion isn't an external model package, it is compatible with DataRobot hosting and can become a challenger. In that scenario, the former champion moves to a DataRobot prediction environment where the deployment can replay the champion's predictions against it.
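
The promotion and demotion rules in the two subsections above can be summarized as a small state change. This is a sketch for reasoning about outcomes, not a DataRobot API: `Slot` and `promote` are hypothetical, and the sketch assumes an eligible former champion is always demoted to a challenger rather than removed.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical record of a model's role in a deployment."""
    name: str
    is_external_package: bool  # created from an external model package
    environment: str           # "external" or "datarobot"


def promote(champion, challengers, challenger):
    """Swap `challenger` in as champion; return (new_champion, remaining_challengers)."""
    remaining = [c for c in challengers if c is not challenger]
    # The champion never switches environment types: the promoted model
    # inherits the former champion's prediction environment.
    new_champion = Slot(challenger.name, challenger.is_external_package, champion.environment)
    # Only a former champion that isn't an external model package can be
    # demoted; it then runs in a DataRobot prediction environment.
    if not champion.is_external_package:
        remaining.append(Slot(champion.name, False, "datarobot"))
    return new_champion, remaining
```

If `new_champion.environment` is `"external"` and the Management Agent isn't configured there, the new champion still has to be deployed manually in that environment.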
